Climbing the Tower of Babel: Unsupervised Multilingual Learning
Abstract
For centuries, scholars have explored the deep links among human languages. In this paper, we present a class of probabilistic models that use these links as a form of naturally occurring supervision. These models allow us to substantially improve performance on core text-processing tasks, such as morphological segmentation, part-of-speech tagging, and syntactic parsing. Beyond these traditional NLP tasks, we also present a multilingual model for the computational decipherment of lost languages.
Similar resources
UMCC_DLSI: Reinforcing a Ranking Algorithm with Sense Frequencies and Multidimensional Semantic Resources to solve Multilingual Word Sense Disambiguation
This work introduces a new unsupervised approach to multilingual word sense disambiguation. Its main purpose is to automatically choose the intended sense (meaning) of a word in a particular context for different languages. It does so by selecting the correct Babel synset for the word and the various Wiki Page titles that mention the word. BabelNet contains all the output information that our s...
Chapter 4 Character encoding in corpus construction
Corpus linguistics has developed, over the past three decades, into a rich paradigm that addresses a great variety of linguistic issues ranging from monolingual research of one language to contrastive and translation studies involving many different languages. Today, while the construction and exploitation of English language corpora still dominate the field of corpus linguistics, corpora of ot...
Are We Moving Toward an Information SuperHighway or a Tower of Babel? The Challenge of Large-Scale Semantic Heterogeneity
2016 BUT Babel System: Multilingual BLSTM Acoustic Model with i-Vector Based Adaptation
The paper provides an analysis of the BUT automatic speech recognition (ASR) systems built for the 2016 IARPA Babel evaluation. The IARPA Babel program concentrates on building ASR systems for many low-resource languages, where only a limited amount of transcribed speech is available for each language. In such a scenario, we found it essential to train the ASR systems in a multilingual fashion. In this w...
Publication date: 2010